Fix integer overflow in compute_numel() #18598
Conversation
🔗 Helpful Links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18598
Note: links to docs will display an error until the docs builds have completed.
❗ 1 active SEV. If your PR is affected, please view it below.
❌ 1 new failure, 3 unrelated failures as of commit 40bfc44 with merge base 0919746:
- NEW FAILURE: the following job has failed.
- FLAKY: the following job failed, but was likely due to flakiness present on trunk.
- BROKEN TRUNK: the following jobs failed, but were already failing on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```cpp
    InvalidProgram,
    "Overflow computing numel at dim %zu",
    static_cast<size_t>(i));
numel *= sizes[i];
```
It's a double calculation, but it ensures we don't abort when computing numel() from an untrusted tensor at parsing time.
Pull request overview
This PR aims to prevent integer overflow while computing tensor numel, which is used for downstream size calculations (and can otherwise lead to undersized allocations and potential buffer overflows).
Changes:
- Add `ssize_t` overflow checks when multiplying tensor dimensions during tensor deserialization.
- Add overflow protection inside `compute_numel()`, used by `TensorImpl`.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| runtime/executor/tensor_parser_portable.cpp | Adds validation for negative sizes and overflow while computing numel during FlatBuffer tensor parsing. |
| runtime/core/portable_type/tensor_impl.cpp | Adds an overflow check in compute_numel() to prevent ssize_t wrap/UB when multiplying dimensions. |
```cpp
    "Overflow computing numel at dim %zu",
    static_cast<size_t>(i));
numel *= sizes[i];
}
```
This guards overflow for numel, but tensor_impl->nbytes() later multiplies numel by elementSize(scalar_type); that multiplication can still overflow size_t and wrap, leading to undersized allocation/reads. Consider adding a byte-size overflow check here (or in TensorImpl::nbytes()) using SIZE_MAX / elementSize(scalar_type) to ensure the computed nbytes is safe.
```cpp
  }
}
// Also ensure that the total number of bytes fits in size_t when
// multiplied by the element size for this scalar type.
const size_t element_size = elementSize(scalar_type);
ET_CHECK_OR_RETURN_ERROR(
    numel == 0 ||
        static_cast<size_t>(numel) <= SIZE_MAX / element_size,
    InvalidProgram,
    "Overflow computing tensor nbytes");
```
```cpp
    "Overflow computing numel: %zd * %zd would overflow ssize_t at dimension %zd",
    numel,
    static_cast<ssize_t>(sizes[i]),
    i);
```
We should remove ET_CHECK_MSG; that has a large blast radius here, though (see the PR summary).
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
```cpp
    static_cast<ssize_t>(sizes[i]),
    i);
ET_CHECK_MSG(
    sizes[i] == 0 || numel <= (ssize_t)(SIZE_MAX / 2) / sizes[i],
```
Can you just use the same c10 overflow check we use in other places?
compute_numel() multiplies tensor dimensions without overflow protection. The result is used for size calculations in make_tensor_ptr() and clone_tensor_ptr(), so an overflow could lead to undersized allocations and subsequent buffer overflows. Add an ET_CHECK_MSG before each multiplication to verify that numel * sizes[i] will not exceed SSIZE_MAX.

This PR was authored with the assistance of Claude.
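The guarded loop described above can be sketched in isolation. This is a minimal stand-in, not the ExecuTorch code: `checked_numel` is a hypothetical name, and GCC/Clang's `__builtin_mul_overflow` stands in for the `c10::mul_overflows`/`ET_CHECK_MSG` pair used in the actual patch.

```cpp
#include <cstdint>
#include <vector>

// Sketch (illustrative, not the ExecuTorch API): accumulate the product of
// dimension sizes, failing (returning -1) instead of wrapping when the
// multiplication would overflow int64_t.
int64_t checked_numel(const std::vector<int64_t>& sizes) {
  int64_t numel = 1;
  for (int64_t s : sizes) {
    if (s < 0) {
      return -1;  // negative sizes are invalid
    }
    int64_t next;
    if (__builtin_mul_overflow(numel, s, &next)) {
      return -1;  // numel * s would overflow
    }
    numel = next;
  }
  return numel;
}
```

In the real patch the failure path is an error return (parsing) or a check macro (TensorImpl) rather than a sentinel value.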
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
```cpp
#include <algorithm>
#include <climits>
#include <cstdint>

#include <c10/util/irange.h>
#include <c10/util/safe_numerics.h>
```
<climits> appears unused in this translation unit. If you need bounds for ssize_t, prefer <limits> + std::numeric_limits<ssize_t>; otherwise, drop the include to reduce churn.
@lucylq has imported this pull request. If you are a Meta employee, you can view this in D102049291.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
```cpp
ET_CHECK_MSG(
    !c10::mul_overflows(numel, static_cast<ssize_t>(sizes[i]), &next_numel),
    "Overflow computing numel: %zd * %zd would overflow ssize_t at dimension %zd",
    numel,
    static_cast<ssize_t>(sizes[i]),
    i);
```
The overflow ET_CHECK_MSG uses %zd for the dimension index, but i comes from c10::irange(dim) (typically an int64_t). With printf-style formatting this mismatch is undefined behavior on some platforms. Cast i to ssize_t (or switch to the correct format macro/specifier) when passing it to the format string.
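To illustrate the fix for the specifier mismatch: a small standalone example (the `format_dim` helper is hypothetical, not ExecuTorch code) showing how `PRId64` keeps an `int64_t` argument matched to its printf conversion, avoiding the `%zd`/`int64_t` UB:

```cpp
#include <cinttypes>  // PRId64
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format a dimension index. Passing an int64_t where
// %zd (ssize_t) is expected is UB on platforms where the types differ;
// using PRId64 (or explicitly casting the argument) keeps them in sync.
std::string format_dim(int64_t i) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "dim %" PRId64, i);
  return std::string(buf);
}
```

The equivalent in-place fix is to cast `i` (e.g. `static_cast<ssize_t>(i)`) so it matches `%zd`.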
```cpp
#include <executorch/runtime/core/portable_type/tensor_impl.h>

#include <algorithm>
#include <climits>
```
#include <climits> was added but doesn't appear to be used in this file. Consider removing it to avoid unnecessary dependencies.
```cpp
#include <climits>
```
```cpp
ssize_t next_numel;
ET_CHECK_OR_RETURN_ERROR(
    !c10::mul_overflows(numel, static_cast<ssize_t>(sizes[i]), &next_numel),
    InvalidProgram,
    "Overflow computing numel at dim %zu",
    static_cast<size_t>(i));
numel = next_numel;
```
Similar to TensorImpl::compute_numel(): this calls c10::mul_overflows with signed ssize_t. On MSVC the fallback implementation multiplies signed values directly, which can be UB when overflow occurs. Since tensor sizes are validated as non-negative here, use an unsigned accumulator for the overflow check (and separately ensure the final product fits in ssize_t) to make the check well-defined across compilers.
This causes some internal failures. Let's rely on D98148157.
compute_numel() multiplies tensor dimensions without overflow protection. The result is used for size calculations in make_tensor_ptr() and clone_tensor_ptr(), so an overflow could lead to undersized allocations and subsequent buffer overflows.
This PR was authored with the assistance of Claude.
--
The TensorImpl ctor can't return an error. To avoid aborting when calculating numel (because of overflow), we have to introduce a new ctor that takes numel as a precomputed value.
The blast radius is a bit large. I think we can do this for now and tackle the ET_CHECK_MSG removal across the codebase separately.
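The precomputed-numel ctor idea can be sketched with a toy class. This is purely illustrative (`TensorSketch` is not the real `TensorImpl`): a constructor cannot return an error, so a ctor that computes numel itself would have to abort on overflow, while an overload accepting a precomputed value lets fallible parsing code validate the product first.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative sketch, not the real TensorImpl.
class TensorSketch {
 public:
  // Existing-style ctor: recomputes numel itself (unchecked here).
  explicit TensorSketch(std::vector<int64_t> sizes)
      : sizes_(std::move(sizes)), numel_(compute_unchecked()) {}

  // New-style ctor: numel was already validated by the caller, so no
  // check (and no possible abort) is needed in the constructor.
  TensorSketch(std::vector<int64_t> sizes, int64_t precomputed_numel)
      : sizes_(std::move(sizes)), numel_(precomputed_numel) {}

  int64_t numel() const { return numel_; }

 private:
  int64_t compute_unchecked() const {
    int64_t n = 1;
    for (int64_t s : sizes_) {
      n *= s;  // may overflow: the behavior the new ctor sidesteps
    }
    return n;
  }

  std::vector<int64_t> sizes_;
  int64_t numel_;
};
```

The caller performs the overflow-checked computation (which can return an error) and then constructs the tensor with the validated value.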